Abstract

Background: Despite many efforts, little is known about the distribution and interactions of chromatin proteins that
contribute to the specificity of chromomeric organization of interphase chromosomes. To address this issue, we
used publicly available datasets from several recent Drosophila genome-wide mapping and annotation projects, in
particular those from the modENCODE project, and compared the molecular organization of 13 interband regions that
had been accurately mapped previously.

Results: Here we demonstrate that in interphase chromosomes of Drosophila cell lines, the interband regions are
enriched for a specific set of proteins generally characteristic of "open" chromatin (RNA polymerase II, CHRIZ
(CHRO), BEAF-32, BRE1, dMI-2, GAF, NURF301, WDS and TRX). These regions also display reduced nucleosome
density, histone H1 depletion and pronounced enrichment for ORC2, a pre-replication complex component. Within
the 13 interband regions analyzed, most were around 3-4 kb long, particularly those where many of these protein
features were present. We estimate that there are about 3500 regions with similar properties in chromosomes of
D. melanogaster cell lines, which agrees well with the number of cytologically observed interbands in salivary gland
polytene chromosomes.

Conclusions: Our observations suggest a strikingly similar organization of interband chromatin in polytene
chromosomes and in chromosomes from cell lines, thereby reflecting the existence of a universal principle of
interphase chromosome organization.



Background 

Genetic activity of interphase chromosomes is intimately 
linked to the properties of chromatin organization. At a 
very basic level, chromatin is organized into nucleosomes,
histone octamer/DNA complexes. These, in turn, form
higher-order structures, such as chromomeres, loops, 
domains, etc. Clearly, key to this organization are the 
chromatin proteins: histones, their post-translational 
modifications, and non-histone proteins. Modern methods
reliably address interphase chromatin organization at the
nucleosomal level; however, details of higher-order
chromatin organization still remain obscure. This is
largely due to our inability to directly visualize the
supra-nucleosomal structures in
diploid interphase nuclei. Giant polytene chromosomes 
from dipterans, in particular from Drosophila, allow one 
to mitigate this problem. 

"Classic" polytene chromosomes from larval salivary 
glands of D. melanogaster are composed of bundles of 
one to two thousand tightly synapsed chromosomal 
strands, which are formed via multiple rounds of endor- 
eplication of just two starting chromatids. As all the 
homologous chromomeres from all chromatids are 
aligned to each other, this results in the formation of a 
thick "cable" with transverse stripes of compacted chro- 
matin (bands) alternating with decompacted interchro- 
momeric regions (interbands). Local differences in size 
and compaction of banded material form a unique 
banding pattern that can be used to accurately map any 
polytene chromosome region. This in turn allows one to 
link a particular DNA sequence, genes and proteins to
the specific chromosomal region, and so to spatially 
analyze the genetic processes taking place in the inter- 
phase nucleus (for review: [1]). 

According to different estimates, there are 3500-5000 
bands and interbands in Drosophila melanogaster poly- 
tene chromosomes; these comprise about 95% and 5% 
of euchromatic DNA, respectively. On average this cor- 
responds to 30 kb of genomic material per band and 2 
kb per interband [2-4]. Obviously, the vast majority of 
genes are situated in bands, as they encompass most of 
the DNA. As a rule, the degree of chromatin compaction
in bands correlates with their transcriptional activity.
This is manifested most clearly in the case of puffing,
i.e. when bands form puffs upon gene activation. Despite
the fact that interbands are also represented by decom- 
pacted chromatin, their genetic organization and func- 
tions are still largely enigmatic. Several hypotheses 
regarding the functions of interbands were put forward 
in the literature (for review: [4]), but they essentially
reduce to just two alternatives: either the interbands
correspond to active genes, or the interbands harbor
regulatory regions for genes that are found in the
neighboring bands. Neither of these scenarios has been
adequately addressed experimentally.
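
The average sizes quoted above follow from simple arithmetic on the number of band/interband pairs and the share of DNA in each compartment. The following minimal sketch reproduces that arithmetic; it assumes a euchromatic genome of about 120 Mb (the figure used later in this paper), and the function name is hypothetical.

```python
def mean_sizes_kb(euchromatin_mb, n_pairs, band_fraction=0.95):
    """Average band and interband length in kb.

    euchromatin_mb -- size of the euchromatic genome in Mb (assumed ~120 here)
    n_pairs        -- number of band/interband pairs
    band_fraction  -- share of euchromatic DNA residing in bands
    """
    total_kb = euchromatin_mb * 1000.0
    band_kb = total_kb * band_fraction / n_pairs
    interband_kb = total_kb * (1.0 - band_fraction) / n_pairs
    return band_kb, interband_kb


# Lower and upper estimates of the number of bands/interbands from the text.
for n in (3500, 5000):
    band, interband = mean_sizes_kb(120, n)
    print(f"{n} pairs: ~{band:.1f} kb per band, ~{interband:.1f} kb per interband")
```

Under these assumptions the estimates span roughly 23-33 kb per band and 1.2-1.7 kb per interband, i.e. the same order as the averages cited above.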

Under the light microscope, many decompacted regions
appear as interbands; however, upon closer examination
at the EM level they in fact comprise series of faint
bands. Thus, of the regions typically considered
interband-like, only some are true interbands. Presently it is
well-known that numerous "open chromatin" proteins 
are typically found in such decompacted regions. For 
instance, these are different forms of RNA polymerase II 
[5,6], including the paused RNA polymerase II [7,8] 
which is necessary for transcription initiation; these are 
proteins and protein complexes involved in transcrip- 
tional elongation: SPT4, SPT5, SPT6, TFIIH, dMEDIA- 
TOR, dELL [8-11]. Likewise, these regions frequently 
contain nucleosome remodeling and histone-modifying 
proteins: CHD1 [12], JIL-1 [13], BRM [14], COHESIN 
[15], TRX [16], WDS [17], H2B monoubiquitinating 
enzyme BRE1 [18], and NURF, which increases accessibility
of chromatin templates [19]; they harbor histone
modifications: H4K16ac [20], H3K9ac, H3K14ac [21],
H3K4me3 [22]. Furthermore, insulator proteins BEAF- 
32 [23] and GAF [24] as well as pre-replication complex 
protein ORC6 [25] are also found in many decompacted 
regions of polytene chromosomes. Finally, there are at 
least two interband-specific and interacting proteins, Z4 
and CHRIZ (CHROMATOR), however their functions 
in interbands are presently unknown [26,27]. 

Despite this plethora of chromatin proteins linked to
interbands, their cytological mapping is not accurate
enough, as it is quite challenging to reliably map a
protein localization signal to the fine structure of
an interband, at least at the resolution level of light
microscopy.

Clearly, in order to address the functions of inter- 
bands, it is important to be able to accurately map inter- 
band regions on a physical map and then to analyze the 
protein binding profiles and chromatin features in these 
regions. Unfortunately, using standard mapping techni- 
ques, it is close to impossible to precisely map DNA 
sequences to interbands as their axial lengths are quite 
small (0.12 µm on average) [2]. To solve this problem,
one must develop new approaches to mark and identify 
interband regions. P-element insertions could serve as 
such useful "markers". Using electron microscopy (EM) 
analysis of polytene chromosomes from stocks with P- 
element-based insertions, our group has previously 
shown that such insertions can be visualized on polytene 
chromosomes as distinct cytological structures [28,29]. 
In most cases, transcriptionally silent chromatin in such 
transgenes becomes compacted and forms novel bands, 
provided that insertions occurred into interbands. When 
inserted into bands, the compacted material from a 
transgene typically fuses with the neighboring material 
and does not form a separate band (Figure 1). As the 
transgene sequence is known, cloning the DNA 
sequence adjacent to the transgene insertion is straight- 
forward, and so one can unambiguously identify the 
sequences that belong to interbands [30-32]. 

Using this approach, we mapped and cloned the DNA 
from 13 interband regions. We found that these inter- 
bands were mainly composed of non-coding intergenic 
regions and 5'-UTRs. Also, many of the interbands were 
rich in DNase I hypersensitive sites (DHSs), which 
turned out to behave as "hot spots" for integration of P- 
element based transgenes [33]. 

With these observations in hand, we decided to
further explore the question of functional organization 
of interbands. First of all, we wanted to establish which 
proteins were specific to the interbands' open chroma- 
tin, and then to ask whether localization of some of 
these proteins could be correlated on a genome-wide 
scale. It was also important to understand whether the
interbands of polytene chromosomes are "mirrored" by
analogous regions in chromosomes from cell lines. In
addition, to address whether a defined molecular border
exists between bands and interbands, it was necessary
to estimate the length of DNA sequences associated
with such proteins. To tackle these questions, we
analyzed data from Drosophila genome-wide protein
mapping databases, mostly those from the NHGRI
modENCODE project [34] and from Filion and
co-authors [35]. These projects included comprehensive
genome-wide analysis of a wide array of chromatin
Figure 1 Morphology of P-element insertions in polytene chromosomes. Possible scenarios: A - transgenic insertion into the interband
results in formation of a novel band; B - electron microscopy image of the region 84E from chromosome arm 3R of wild-type (top) and
transgenic for cHBA (bottom) larvae. Transgenic material forms a novel band (black arrow), which is absent from the chromosomes in control
stock (white arrow); C - transgenic insertion does not result in formation of a novel band; D - electron microscopy image of the region 12E of
chromosome X from wild-type (top) and cHBA transgenic (bottom) larvae. Chromosome morphology remains unaltered (black arrow) in the
transgenic strain as compared to the wild-type chromosome (white arrow). Some marker bands are shown by arrowheads. Bar corresponds to 1 µm.



proteins and histone modifications from D. melanoga- 
ster cell lines. As a result, 5 [35], 9 and even 30 [36] dis- 
tinct chromatin types were identified, which were 
characterized by specific combinations of classes of 
genes and associated proteins. 



Using the abovementioned data obtained on inter- 
phase chromosomes of cell lines, in the present work we 
performed comparative analysis of thirteen interband 
regions from polytene chromosomes searching for the 
proteins specifically enriched in interbands. The vast
majority of the interbands studied were found to associate
with a set of proteins that is typically found in open
chromatin. These open chromatin proteins tended to
localize to regions of low nucleosome density and histone
H1 depletion and to correlate with binding of ORC2,
a pre-replication complex protein. Our data suggest that 
regions possessing most of these features combined are 
typically smaller than 3-4 kb in length, and that the 
number of such regions closely matches the estimated 
number of cytologically distinct interbands in polytene 
chromosomes. Furthermore, our data demonstrate that 
interband chromatin is similarly organized in different 
cell types, thereby suggesting its participation in general 
processes that serve to form and maintain the functional 
architecture of interphase chromosomes. 

Results 

Open chromatin proteins and histone marks are found in 
the cell line chromosome regions that correspond to 
polytene chromosome interbands 

Distribution profiles for several dozens of proteins and 
histone marks in several D. melanogaster cell types have 
been established through the efforts of modENCODE 
project [34]. We used these data and other chromatin 
features and focused on the regions that correspond to 
13 previously mapped interband regions from polytene 
chromosomes [31,33]. Specifically, we used modENCODE
ChIP-chip datasets for S2 cells and in some instances
Kc167 cells, which were generated for 18 histone
modifications and 25 chromatin proteins belonging
to different functional classes. Notably, band/interband
transition points remain presently unknown, and interband
size estimates vary quite widely, from 0.3 to
over 3.8 kb [1,37]. Thus, we compared binding profiles
for these proteins over 10 kb regions centered around
insertion sites of reference transgenes which were
mapped to the interbands studied and used to clone
respective DNA sequences (Additional file 1 Figure S1,
Additional file 2 Table S1). Figure 2 illustrates that in
cell lines most of the 13 regions analyzed (80-100%)
associate with open chromatin proteins. Notably, most
of these proteins are bound significantly less frequently
in control sets of random DNA sequences
of equal size from the D. melanogaster genome or from
three large molecularly mapped bands 10A1-2, 75C1
and 75C2 (Figure 2, Additional file 2 Table S4) [38,39].
Of these open chromatin proteins, RNA polymerase II, 
CHRIZ, ORC2, GAF, BEAF-32, CP190, TRX, as well as 
H3K9ac, H4K16ac and H3K4me3 were previously 
reported to partially or completely immunolocalize to 
interbands (for review: [4]). The remaining factors (WDS,
dMI-2, NURF301, BRE1, H3K4me2/3 and H4K16ac) are
known to contribute to chromatin remodeling and
transcriptional regulation. We failed to observe
H3K4me3-LP and tetra-H4ac in interband regions, even
though these histone marks were reported as present in
transcriptionally active chromatin (Supplementary
Figures 11-12 from [36]). We attribute this to the
quality of the H3K4me3-LP antibody: although H3K4me3
(affinity-purified) and H3K4me3-LP (crude serum) show
very similar overall distribution profiles (Additional
file 1 Figure S1), the latter antibody rarely displays
enrichment above the significance threshold defined by
modENCODE.

Another peculiar feature of the regions studied is that
they very frequently (> 90%) encompass H1-dips (Figure
2B, Additional file 1 Figure S1), i.e. regions depleted
for histone H1 [40]. This linker histone is known to be
the key protein in compacting the 10 nm chromatin
fiber into the 30 nm super-beaded form [41]. Therefore,
the presence of H1-dips can be considered a marker of
open chromatin. It is interesting to note that the trends
observed for proteins and histone marks associated with
open chromatin over 10 kb were essentially the same
over 4 kb centered at the insertion points of reference
transgenes (Figure 2). This might point to possible
functional interactions of these proteins in these regions
of the genome. We next observed that 50-70% of the
regions analyzed were also associated with HP1c, HP1b,
JIL-1, dRING, H3K36me3 and H3K79me1. Finally, in
the cell line regions that correspond to interbands,
there was little or no binding of typical "closed
chromatin" (transcriptionally inert chromatin) proteins
and marks such as HP1a, PC, HP2b, MOD(MDG4), SU(HW), E(Z),
SU(VAR)3-7, SU(VAR)3-9, H3K9me2, H3K9me3,
H3K27me3, H3K23ac (Figure 2).
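
The percentages summarized in Figure 2 amount to asking, for each reference insertion site, whether at least one significantly bound fragment falls within a window centered at that site. The sketch below illustrates this counting for the 10 kb and 4 kb windows; it is not the pipeline actually used, the function names are hypothetical, and the coordinates are invented toy values.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if half-open intervals [a_start, a_end) and [b_start, b_end) intersect."""
    return a_start < b_end and b_start < a_end


def percent_regions_bound(centers, peaks, window_bp):
    """Percent of reference sites with at least one peak inside the window.

    centers   -- list of (chrom, position) reference insertion sites
    peaks     -- list of (chrom, start, end) significantly bound fragments
    window_bp -- full window width centered at each site (e.g. 10000 or 4000)
    """
    half = window_bp // 2
    bound = 0
    for chrom, pos in centers:
        lo, hi = pos - half, pos + half
        if any(c == chrom and overlaps(lo, hi, s, e) for c, s, e in peaks):
            bound += 1
    return 100.0 * bound / len(centers)


# Toy coordinates, invented for illustration only.
centers = [("2L", 50_000), ("2L", 200_000), ("3R", 75_000), ("X", 10_000)]
peaks = [("2L", 49_000, 50_500), ("2L", 203_500, 204_500), ("3R", 90_000, 91_000)]

print(percent_regions_bound(centers, peaks, 10_000))  # 10 kb window
print(percent_regions_bound(centers, peaks, 4_000))   # 4 kb window
```

Comparing the two window sizes in this way shows whether binding is concentrated near the insertion site or merely present somewhere in the wider region.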

We then analyzed in more detail the profiles for each 
of the chromatin proteins and histone marks, for P-ele- 
ment insertions and for nucleosome-depleted regions 
within ± 5 kb from insertion sites of reference trans- 
genes in 13 interband regions. DNA sequences encom- 
passing 1.5-4 kb around these sites were considerably 
enriched in many open chromatin proteins, such as 
RNA polymerase II, CHRIZ, ORC2, GAF, BEAF-32, 
CP190, TRX, WDS, dMI-2, NURF301 and BRE1. 
Furthermore, these same regions tended to display 
lower nucleosome density and served as hot spots for P- 
element integrations (Figure 3, Additional file 1 Figure 
S1). Of the histone marks that are characteristic of
active chromatin, the following five were found most
frequently (50-100%) and most widely (8-10 kb):
H3K4me2, H4K8ac, H3K9ac, H3K4me1 and H4K16ac.
In contrast to the non-histone proteins found in active
chromatin, the distribution of "active" histone marks is
somewhat wider, with a slight increase towards the edges
of the sequences analyzed (Figure 3). As mentioned
above, in the interband regions studied, the
enrichment for "inactive" marks is close to negligible;
Figure 2 Integrative view of chromatin protein distribution over genomic regions of cell lines corresponding to 13 interbands from
polytene chromosomes. A - non-histone proteins; B - histone modifications. Proteins analyzed are shown on the X axis, Y axis shows percent
of interband regions bound by said proteins (out of 13 regions total). Colored bars correspond to percent of regions bound within 10 kb around
insertion site; grey bars show percent of regions bound within 4 kb around reference transgenic insertions. Mean and standard deviation for
randomly chosen genome regions and band sequences used as a control are represented by the colors black and pink, respectively. * - P-value
(1.0E-02..1.0E-03), ** - P-value (1.0E-03..1.0E-06), *** - P-value < 1.0E-06; black and pink shaded asterisks: observed values > expected values, blue
shaded asterisks: observed values < expected values.



hence we failed to identify any peculiar features in their 
localization. 
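
The heat-map in Figure 3 rests on a per-segment summary: each 10 kb region is cut into twenty 0.5 kb segments, and for each segment position the percent of reference regions showing the feature is recorded. A minimal sketch of that tabulation is given below, assuming half-open coordinates; the function name and the toy coordinates are hypothetical.

```python
def segment_profile(centers, peaks, span_bp=10_000, seg_bp=500):
    """Percent of reference regions bound in each segment of the span.

    centers -- list of (chrom, position) reference insertion sites
    peaks   -- list of (chrom, start, end) bound fragments, half-open
    Returns a list with span_bp // seg_bp values, ordered left to right.
    """
    n_seg = span_bp // seg_bp
    counts = [0] * n_seg
    for chrom, pos in centers:
        left = pos - span_bp // 2
        for i in range(n_seg):
            seg_start = left + i * seg_bp
            seg_end = seg_start + seg_bp
            # A segment counts as bound if any peak overlaps it.
            if any(c == chrom and s < seg_end and seg_start < e
                   for c, s, e in peaks):
                counts[i] += 1
    return [100.0 * c / len(centers) for c in counts]


# Toy coordinates, invented for illustration only.
centers = [("2L", 50_000), ("3R", 75_000)]
peaks = [("2L", 49_500, 50_500), ("3R", 74_000, 75_000)]

profile = segment_profile(centers, peaks)
for i, value in enumerate(profile):
    offset_kb = -5.0 + 0.5 * i
    print(f"{offset_kb:+.1f} kb: {value:.0f}%")
```

A profile that peaks around the central segments and falls off towards the flanks is what one would expect for a feature confined to the interband itself.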

Figure 4 demonstrates enrichment profiles for differ- 
ent functional classes of proteins over the regions of 
interphase chromosomes from cell lines that correspond 
to polytene chromosome interbands. Histone marks 



appear either widely enriched or uniformly distributed 
along the whole region, or slightly increasing towards 
the ends of the sequences. For most regions, non-histone
proteins, which mainly comprise markers of active
chromatin, are enriched over 1.5-4 kb around the insertion
sites of reference transgenes. However, in two instances,
Figure 3 Heat-map for protein and chromatin features found in 0.5 kb segments over 10 kb regions centered at the insertion sites of
reference P-transposons. Percent of fragments that bind a specific protein or display nucleosome density depletion or harbor P-element
insertions is color-coded on the bottom. Asterisk - relative content of P-insertions in the segments is shown (see Additional file 2 Table S5).
Insertion sites of reference P-transposons are indicated by a solid vertical line. Vertical dashed lines indicate the most likely borders of interbands.



namely in interbands 60E8/E10 and 87C8/9, these
enrichment regions are instead found next to the
reference insertion sites. We interpret these data as the
transgenic insertions having hit the very edge of an
interband; alternatively, this could be a consequence of
distinct transcriptional activities in these regions in salivary
glands and in cell lines.

Overall, the data presented here argue for a similar
protein composition and chromatin organization of the
13 "true" interband regions studied in polytene
chromosomes and of the corresponding genomic regions
in cell lines.

Genome-wide analysis of proteins found in interband 
regions 

To uncover the genome-wide localization characteristics 
for proteins that map to selected interband regions, we 
used GEO (Gene Expression Omnibus) datasets 
Figure 4 Distribution of chromatin proteins over the regions corresponding to individual interbands in D. melanogaster polytene
chromosomes. X axis shows 10 kb of a physical map for the specific region centered at the insertion site of a reference P-transposon (shown
as a solid vertical line). Coordinates of the reference transgenic insertions are shown in Additional file 2 Table S1. Position and orientation of
underlying genes (as in FlyBase Genes r. 5.12) is indicated below as horizontal blocks and arrows. Y axis shows combined percentages of 0.5 kb
long DNA segments found associated with a particular class of proteins (n = 20). Color-coding for such classes is indicated below. Vertical
dashed lines delimit the regions most probably corresponding to interbands.
available as gff-files at http://www.ncbi.nlm.nih.gov/gds.
These files describe genomic regions significantly bound
by most of the proteins assayed by modENCODE. We
selected fragments with positive scores for non-histone
proteins and H1-dips (Additional file 2 Table S3) for all
Drosophila chromosomes and estimated their genome-wide
distributions and lengths. A large fraction (70-95%) of
these fragments, bound by either "active" or "silent"
chromatin proteins, was 1 to 3 kb long (Table 1). The
number of 1-3 kb fragments bound by "active" chromatin
proteins (RNA pol II, CHRIZ, WDS, ORC2, H1-dips, GAF,
CP190, BEAF-32, dMI-2, NURF301, BRE1, TRX) is
3000-5300 (Table 1), which roughly corresponds to the
observed number of interbands in polytene chromosomes
[4]. In contrast, there are far fewer fragments
(760-2800) of similar size (1-3 kb) that are
associated with the "silent" chromatin proteins PC, E(Z),
dRING, or with typical insulator components: CTCF,
MOD(MDG4), SU(HW) (Table 1).
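
The length classes of Table 1 can be tabulated directly from GFF records. The sketch below shows one way to do so; it is not the procedure actually used. It assumes that fragments are assigned to classes after rounding their length to whole kilobases (the binning rule is not stated in the text), the function names are hypothetical, and the GFF lines are invented toy records.

```python
from collections import Counter

BINS = [(1, 3), (4, 6), (7, 9), (10, 11)]  # kb classes as in Table 1


def fragment_lengths_kb(gff_lines):
    """Yield lengths (kb) of features with a positive score from GFF lines."""
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        cols = line.rstrip("\n").split("\t")
        start, end, score = int(cols[3]), int(cols[4]), cols[5]
        if score != "." and float(score) > 0:
            # GFF coordinates are 1-based and inclusive.
            yield (end - start + 1) / 1000.0


def bin_lengths(lengths_kb):
    """Count fragments per class after rounding to whole kb (an assumption)."""
    counts = Counter()
    for length in lengths_kb:
        kb = int(length + 0.5)  # round half up
        for lo, hi in BINS:
            if lo <= kb <= hi:
                counts[(lo, hi)] += 1
    return counts


# Toy GFF records, invented for illustration only.
toy_gff = [
    "##gff-version 3",
    "2L\ttoy\tbinding_site\t1001\t3000\t2.5\t.\t.\tID=a",
    "2L\ttoy\tbinding_site\t5001\t10200\t1.1\t.\t.\tID=b",
    "3R\ttoy\tbinding_site\t100\t8099\t0.7\t.\t.\tID=c",
    "3R\ttoy\tbinding_site\t100\t900\t-0.3\t.\t.\tID=d",
    "X\ttoy\tbinding_site\t1\t1500\t3.0\t.\t.\tID=e",
]

counts = bin_lengths(fragment_lengths_kb(toy_gff))
for lo, hi in BINS:
    print(f"{lo}-{hi} kb: {counts[(lo, hi)]}")
```

Dividing each count by the total number of binned fragments for a given protein gives the percentages shown in parentheses in Table 1.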

In order to estimate how frequently these proteins co- 
localize in D. melanogaster genome, we performed their 
pair-wise comparison. The number of overlapping pairs 
was considered as a similarity measure for every pair of 
factors being compared. Only the fragments that showed 
positive scores and which were smaller than 10 kb were 
considered. We calculated the number of unique paired 
overlaps between the fragments (Additional file 2 Table 



Table 1 Genome-wide analysis of the number and lengths of DNA fragments bound by the proteins
represented within interband regions

              Number of fragments (%) by fragment length (in kb)
Protein       1-3            4-6            7-9           10-11
RNA pol II    4760 (72.6)    1288 (19.6)    396 (6.0)     111 (1.7)
CHRIZ         3045 (67.7)    1103 (24.5)    264 (5.9)     89 (2.0)
WDS           4639 (89.8)    484 (9.4)      41 (0.8)      4 (0.1)
ORC2          4178 (84.4)    651 (13.1)     108 (2.2)     16 (0.3)
H1_dip        3543 (76.3)    796 (17.1)     237 (5.1)     68 (1.5)
PC            1491 (72.6)    373 (18.2)     138 (6.7)     52 (2.5)
GAF           2909 (80.2)    591 (16.3)     110 (3.0)     16 (0.4)
CP190         4982 (86.1)    714 (12.3)     81 (1.4)      11 (0.2)
BEAF-32       4060 (88.0)    499 (10.8)     53 (1.1)      4 (0.1)
CTCF          1446 (75.3)    405 (21.1)     62 (3.2)      7 (0.4)
MOD(MDG4)     843 (98.9)     9 (1.1)        0 (0.0)       0 (0.0)
Su(Hw)        2829 (69.6)    1020 (25.1)    178 (4.4)     36 (0.9)
dMi-2         3802 (83.2)    625 (13.7)     118 (2.6)     23 (0.5)
NURF301       5336 (87.0)    680 (11.1)     101 (1.6)     13 (0.2)
BRE1          5245 (87.7)    675 (11.3)     54 (0.9)      7 (0.1)
Trx-C         4885 (93.6)    319 (6.1)      16 (0.3)      0 (0.0)
dRing         2019 (88.4)    198 (8.7)      55 (2.4)      13 (0.6)
E(Z)          760 (83.6)     113 (12.4)     27 (3.0)      9 (1.0)

S6) and from these estimated the pair-wise correlation
coefficients between the proteins (Additional file 2 Table S7).
The highest correlation coefficients were observed for
the "active" chromatin proteins and for proteins
enriched in the 13 interbands, i.e. for BEAF-32,
CHRIZ, RNA pol II, ORC2, H1-dips, TRX, WDS,
NURF301 and BRE1. The same was observed for the "silent"
chromatin group of proteins: MOD(MDG4), SU(HW),
E(Z), dRING. To verify whether this co-localization is
significant, we first fragmented the euchromatic part of
the genome (120 Mb) into non-overlapping 3 kb-long
blocks (the median size of fragments that are bound by
these proteins (Table 1)). We then analyzed each of
these ~40000 blocks for the presence of all pair-wise
combinations of these proteins. As shown in Additional
file 2 Table S8, the probability of independent
pair-wise localization of all "active" proteins in the
interbands studied is very low (P-value < 10^-300). Figure 5A
shows a multidimensional scaling plot (see Methods) of
the correlations mentioned above. The "active" chromatin
proteins characteristic of interbands cluster together
and away from the cluster of "silent" chromatin proteins
that do not map to interbands.
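
The block-based significance test described above can be sketched as follows. The statistical test actually applied is not specified in this section, so the hypergeometric upper tail used here is an assumption, as are the function names and the toy block counts. Probabilities are handled in log space because values below 10^-300 cannot be represented as ordinary floating-point numbers.

```python
import math


def log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def log10_hypergeom_tail(n_blocks, n_a, n_b, n_both):
    """log10 of P(overlap >= n_both) if A and B occupy blocks independently."""
    log_terms = []
    for k in range(n_both, min(n_a, n_b) + 1):
        if n_b - k > n_blocks - n_a:
            continue  # not enough A-free blocks for the remaining B blocks
        log_terms.append(log_comb(n_a, k)
                         + log_comb(n_blocks - n_a, n_b - k)
                         - log_comb(n_blocks, n_b))
    if not log_terms:
        return float("-inf")
    top = max(log_terms)  # log-sum-exp to avoid underflow
    total = top + math.log(sum(math.exp(t - top) for t in log_terms))
    return total / math.log(10)


def occupied_blocks(fragments, block_bp=3000):
    """Set of (chrom, block index) touched by half-open (chrom, start, end)."""
    blocks = set()
    for chrom, start, end in fragments:
        for idx in range(start // block_bp, (end - 1) // block_bp + 1):
            blocks.add((chrom, idx))
    return blocks


# 120 Mb of euchromatin cut into 3 kb blocks, as in the text.
n_blocks = 120_000_000 // 3000
print(n_blocks)

# Toy counts, invented for illustration: two factors occupying 4000 blocks
# each and sharing 3500 of them.
print(log10_hypergeom_tail(n_blocks, 4000, 4000, 3500))
```

With occupancy of this order, independence would predict only a few hundred shared blocks, so an overlap of several thousand corresponds to a vanishingly small tail probability.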

Using the agglomerative hierarchical clustering (AHC) 
approach, we estimated the co-localization frequencies 
for all the proteins. These formed 3 separate groups 
(Figure 5B). The first group comprised the "active" chromatin
factors, such as BEAF-32, CHRIZ, H1-dips, RNA pol II,
ORC2, TRX and WDS, many of which were
reported to immunolocalize to decompacted regions of 
polytene chromosomes. It is interesting to note that the 
numbers of pair-wise overlaps for the proteins from this 
group are fairly tight, ranging from 3300 to 3800 (3600 
on average), which fits very well the number of inter- 
bands in polytene chromosomes [4]. Nucleosome remo- 
deling proteins such as NURF301, dMI-2 and GAF also 
tend to co-localize with this group. The two remaining 
groups of proteins are represented mainly by Pc-G pro- 
teins - PC, E(Z), dRING and by insulator proteins, 
MOD(MDG4), SU(HW), CTCF, CP190, and surprisingly 
by BRE1. These proteins display low levels of co-locali- 
zation frequency with the proteins from the first group, 
and so appear not to be present in interbands. 
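
For readers unfamiliar with agglomerative hierarchical clustering, the sketch below shows the general procedure on a small dissimilarity table. The linkage rule and dissimilarity measure used in the XLStat analysis are not stated here, so average linkage is an assumption; the function name is hypothetical and the dissimilarity values are invented for illustration.

```python
def average_linkage(names, dist):
    """Agglomerative clustering with average linkage.

    names -- list of item labels
    dist  -- dict mapping frozenset({a, b}) to a dissimilarity
    Returns the merge history as (cluster_a, cluster_b, distance) tuples.
    """
    clusters = [(name,) for name in names]
    history = []

    def between(c1, c2):
        # Mean dissimilarity over all cross-cluster pairs.
        pairs = [dist[frozenset((a, b))] for a in c1 for b in c2]
        return sum(pairs) / len(pairs)

    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = between(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        history.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return history


# Toy dissimilarities (e.g. 1 minus an overlap-based similarity), invented.
names = ["CHRIZ", "BEAF-32", "PC", "E(Z)"]
dist = {
    frozenset(("CHRIZ", "BEAF-32")): 0.1,
    frozenset(("PC", "E(Z)")): 0.2,
    frozenset(("CHRIZ", "PC")): 0.9,
    frozenset(("CHRIZ", "E(Z)")): 0.9,
    frozenset(("BEAF-32", "PC")): 0.9,
    frozenset(("BEAF-32", "E(Z)")): 0.9,
}

for left, right, d in average_linkage(names, dist):
    print(left, "+", right, f"at {d:.2f}")
```

Factors that frequently share fragments merge early at small dissimilarities, while groups that rarely co-localize are joined only at the final, most distant step.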

Discussion 

Using genome-wide distribution data for a wide range of 
non-histone proteins and histone marks available for D. 
melanogaster cell lines [35,36,40], we analyzed the pro- 
tein composition and chromatin features in genomic 
regions of cell line chromosomes corresponding to 13 
interband regions of polytene chromosomes. Our results 
establish these regions as depleted for the linker histone 
H1 (showing H1-dips), and associated with a specific set
of proteins characteristic of "active" chromatin (Figures 
Figure 5 Graphic representation of co-localization extent for "interband" chromatin proteins over the entire fly genome. A -
Multidimensional scaling (MDS) plot for 18 binding factors; horizontal and vertical axes show the degree of co-localization in conditional units. B
- Agglomerative hierarchical clustering (AHC) of factors analyzed (XLStat Inc, http://www.xlstat.com).



2 and 3). This is also consistent with the distribution of 
different states of chromatin in these genomic regions 
(Figure 6, Additional file 1 Figure S1). Namely, of the
five principal states of chromatin that were previously
identified in Drosophila cell lines and color-coded by 
Filion with co-authors [35], it is predominantly RED 
chromatin that we observe most frequently within 10 kb 



fragments encompassing interbands. This chromatin is 
reported as enriched in ORC binding sites as well as in 
regulatory sequences and mainly comprises genes which 
are linked to specific processes such as "receptor bind- 
ing", "defense response", "transcription factor activity" 
and "signal transduction" [35]. The interband regions 
studied also contain YELLOW and BLUE chromatin 
Figure 6 Distribution of various chromatin states in 13 regions of D. melanogaster genome that correspond to interbands in polytene
chromosomes. A - 5 "colored" chromatin states according to [35]; B - 9 chromatin states according to [36]. X axis shows sizes of DNA segments 
centered at the insertion sites of reference P-transposons. Y axis shows number of regions associated with a particular type of chromatin. 



(Figure 6A). Transcriptionally active YELLOW chroma- 
tin is specifically marked with H3K36me3, a mark of 
transcriptional elongation typically present on genes 
with a broad expression pattern over many developmen- 
tal stages and tissues, so-called "house-keeping" genes. 
BLUE chromatin is mostly found in genome regions 
associated with Pc-G proteins and harboring develop- 
mental genes as well as many of the highly conserved 
non-coding elements (HCNEs) that contribute to gene 
regulation [35]. It is important to emphasize that the 
fraction of RED chromatin relative to the rest of the
chromatin types increases closer to the insertion sites
marking interband regions (Additional File 3 Figure S2).
At the 10 kb level, RED chromatin is 1.9 and 2.6 times
enriched compared to the YELLOW and BLUE states,
respectively, whereas when the regions ± 1 kb around
insertion sites are considered, RED chromatin is 3.3
times more frequent. GREEN and BLACK chromatin
states characteristic of genetically silent material (peri- 
centric heterochromatin and transcriptionally inactive 
regions scattered over the genome, respectively) are very 
rarely found in interbands and if present tend to be 
located on the flanks (Figure 6A, Additional File 3 Fig- 
ure S2A). 

According to the 9-state model of chromatin organiza- 
tion in cell lines [36], the regions corresponding to 
interbands are mostly composed of state 1 and state 3 
chromatin (Figure 6B, Additional File 3 Figure S2B). 
State 1 chromatin is rich in promoters, TSSes and 5'- 
UTRs. State 3 chromatin is mainly characterized by the 
presence of large first introns in long genes, enrichment 
for specific chromatin remodeling factors (for instance 
SPT16 and dMI-2), presence of enhancers and early ori- 
gins of replication. As compared to states 1 and 2, state 
3 domains show stronger enrichment for the
transcription-associated histone variant H3.3 [36]. Despite
some differences in approaches as well as in the proteins
analyzed in [35,36], the regions that correspond to the 13
interbands display a consistent set of features. They are
mostly represented by regulatory and promoter regions
for the genes which appear to reside in the adjacent
compacted material of bands (chromomeres).

Most of the "active" chromatin proteins that mapped 
in cell lines to DNA regions corresponding to inter- 
bands, are known to immunolocalize to interbands (for 
review: [4]). Therefore, it is plausible to suggest that the 
"open" chromatin feature and the localization of a speci- 
fic set of proteins are inter-related, and in fact represent 
a universal principle of interphase chromosome organi- 
zation. This conclusion is consistent with the highly 
detailed observations by W. Beermann, who compared 
banding patterns in four larval tissues of Chironomus, 
and who observed them to match perfectly except for 
minor differences at certain regions and differences due 
to puffing [2]. Similar work on Drosophila also described 
very minor changes in banding pattern [42]. Significant 
similarity in banding patterns was subsequently observed 
upon comparison of many different tissues from many 
insects (for review: [1,43]). That "active" chromatin is 
invariably present in interbands, is also supported by the 
similar pattern of DHSs in salivary gland polytene chro- 
mosomes and in embryonic cells. For instance, mapping 
of major DHSs on physical and cytological maps of the 
fa^swb interband demonstrated their identical localization,
length and number in the chromatin of embryonic cells, 
cell lines [44] and in larval cells [33]. This might help to 
explain high frequency of P-element integrations into 
interbands, as insertions tend to hit the regions of DHSs 
[32,33]. It must be emphasized that P elements trans- 
pose and integrate in diploid germline cells, and there 
are no reasons to believe that insertion sites are linked
in any way to the gene expression nearby [45]. Within 
reference interbands, we observed P-elements to predo- 
minantly cluster around open chromatin regions (Figure 
3), therefore this might suggest that these same DNA
sequences are also organized in open chromatin in 
germline cells, where P-elements actually transpose and 
integrate. 

Conclusions 

Based on the genome-wide protein mapping data gener- 
ated by modENCODE on D. melanogaster cell lines, and 
using previously mapped interband regions as a refer- 
ence, we for the first time demonstrated that decom- 
pacted chromatin regions that appear as interbands in 
polytene chromosomes are organized the same way in 
other cell types and correspond to interchromomeres of 
interphase chromosomes in cell lines. The peculiarities 
of protein distribution identified for interband regions 
can serve as convenient markers to precisely map inter- 
bands to the molecular map, thereby allowing one to 
compile comparative molecular and cytogenetic maps of 
interphase chromosomes in different Drosophila cell 
types. Further experimental validation of band
and interband regions on a larger scale should help
to firmly establish this conclusion. Precise mapping of
band/interband positions across the entire Drosophila
genome using our approach is the subject of
separate work which is currently underway.

Methods 

Cytological Analysis of Polytene Chromosomes 

Salivary gland polytene chromosome squashes were pre- 
pared for electron microscopy analysis and examined as 
described earlier [46]. The sections with a thickness of 
120-150 nm were cut using an LKB-IV ultratome (Swe- 
den) and examined with a JEM-100C (Japan) electron 
microscope at 80 kV. Transgenic fly stocks contain
insertions of the cHBA transposon, which is an 18 kb-long
P-transposon encompassing the D. melanogaster gene rosy
and β-gal from E. coli [47].

Genomic analysis 

ChIP-chip data files for chromatin proteins and histone
modifications from Drosophila cell lines (Additional
file 2 Table S2) were downloaded from the modENCODE
consortium website [48]. The coordinates of chromatin
domains determined elsewhere [35] were extracted
from NCBI Gene Expression Omnibus [49], accession
number GSE22069. Centers of 12 interbands (dm3
assembly) coincided with the integration sites of P
transposons used to map respective interbands; for the
interband 3C6/C7, the proximal border of deletion fa^swb
[50] was selected as a central point. The coordinates of
P-transposon insertion sites (Additional file 2 Table
S1) were downloaded from FlyBase [51] (release
FB2010_01).



To check whether 18 proteins might cluster through- 
out the whole genome, we performed pair-wise compar- 
ison of these regions and counted the number of 
overlapping pairs as a similarity measure for every pair 
of binding regions. Only the fragments with positive 
scores shorter than 10 kb were considered (Additional 
file 2 Table S3). The formalized procedure was as follows.
Let $L_i = (l_{1i}, \ldots, l_{mi})$ and $L_j = (l_{1j}, \ldots, l_{nj})$
be the vectors representing two binding proteins $i$ and $j$;
$i, j \in [1, \ldots, 18]$; $m$, $n$ are the dimensions (sizes) of the
vectors. We remove the redundant regions from $L_i$, $L_j$ which bind
the same region from the counterpart vector, thereby
obtaining the reduced sizes $m'$, $n'$ of the corresponding
vectors. We define two regions, one from each vector, as
overlapping if they possess a nonzero common location on DNA.
Then we define the similarity rate as

$$r_{ij} = \frac{k}{\min(n', m')},$$

where $k$ is the number of overlapping regions, and consequently
compile the similarity matrix $R = \{r_{ij}\}$. Then we apply
multidimensional scaling (MDS) with XLStat add-on software
http://www.xlstat.com for the matrix R obtained as 
described at the previous step. We used non-metric 
MDS model, where only the order of the similarities 
counts (ordinal (2)). 
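
The similarity rate defined above can be illustrated with a minimal Python sketch. It is not the code used in this study. Regions are assumed to be half-open (start, end) intervals on a single chromosome arm, and the redundancy-removal rule (a region is dropped if it hits a counterpart region already hit by an earlier kept region) is only one possible reading of the procedure. The names `overlaps`, `reduce_against` and `similarity_rate` are hypothetical.

```python
from typing import List, Tuple

Region = Tuple[int, int]  # (start, end), half-open, same chromosome arm (assumption)


def overlaps(a: Region, b: Region) -> bool:
    """True if the two regions share at least one position on DNA."""
    return a[0] < b[1] and b[0] < a[1]


def reduce_against(regions: List[Region], counterpart: List[Region]) -> List[Region]:
    """Drop 'redundant' regions: those hitting a counterpart region that an
    earlier kept region already hits (assumed interpretation)."""
    claimed = set()
    kept = []
    for r in regions:
        hits = {idx for idx, c in enumerate(counterpart) if overlaps(r, c)}
        if hits & claimed:
            continue  # binds the same counterpart region as a kept one
        claimed |= hits
        kept.append(r)
    return kept


def similarity_rate(l_i: List[Region], l_j: List[Region]) -> float:
    """r_ij = k / min(m', n'), with k the number of overlapping regions."""
    r_i = reduce_against(l_i, l_j)   # reduced size m'
    r_j = reduce_against(l_j, r_i)   # reduced size n'
    k = sum(1 for a in r_i if any(overlaps(a, b) for b in r_j))
    denom = min(len(r_i), len(r_j))
    return k / denom if denom else 0.0
```

Applying `similarity_rate` to every pair of the 18 region lists would fill the matrix R; the brute-force pairwise scan is quadratic and is chosen here only for readability.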

Agglomerative hierarchical clustering (AHC) with the 
same metric as in MDS was used to assess non-random 
clusters in the pair-wise comparisons (XLSTAT Inc). 

To evaluate the significance of protein binding site
co-localization, we used the chi-square test for a 2 x 2
contingency table as follows. We considered the number of
non-overlapping fragments with an average length of about 3
kb in 120 Mb of the euchromatic part of the D. melanogaster
genome, obtaining n = 40000 fragments in total.
Next, for each pair of proteins we calculated the contingency
table, where l and m are the numbers of peaks with
positive scores for the two proteins in a pair. The expected
(theoretical) number of overlapping sites under a random
overlap model for two proteins is $e_{11} = l \cdot m / n$.
This model is robust to variation in the total number of
segments within the interval [40000-80000], with significance
increasing as the total number of segments increases. Thus,
we used the value of 40000 as a conservative estimate.
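
As an illustration of the test described above, the following Python sketch builds the 2 x 2 table for one protein pair and computes the chi-square statistic. It is a sketch under stated assumptions, not the original analysis code: no continuity correction is applied (the text does not say whether one was used), the P-value uses the closed form for one degree of freedom, and the function name `colocalization_chi2` and its arguments are hypothetical.

```python
import math


def colocalization_chi2(l: int, m: int, observed: int, n: int = 40000):
    """Chi-square test for co-localization of two proteins.

    l, m     -- numbers of peaks with positive scores for the two proteins
    observed -- number of overlapping sites actually counted
    n        -- total number of ~3 kb fragments (40000 in the text)
    Assumes 0 < l < n and 0 < m < n so that no expected count is zero.
    Returns (expected overlap l*m/n, chi-square statistic, P-value).
    """
    expected_overlap = l * m / n
    # Rows: protein A bound / not bound; columns: protein B bound / not bound.
    table = [
        [observed, l - observed],
        [m - observed, n - l - m + observed],
    ]
    row_sums = [sum(row) for row in table]
    col_sums = [table[0][c] + table[1][c] for c in range(2)]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            e = row_sums[i] * col_sums[j] / n
            chi2 += (table[i][j] - e) ** 2 / e
    # Survival function of chi-square with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return expected_overlap, chi2, p_value
```

Rerunning the function with n set to 80000 instead of 40000 is one way to check the robustness statement: for a fixed observed overlap the expected overlap halves, so the same observation becomes more significant.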

Statistical analysis 

To assess whether protein binding sites preferentially 
localize to the experimentally confirmed 13 interbands 
at a statistically significant level, we performed 13000 
random samplings of equivalent DNA chunks (4 and 10 
kb segments) across D. melanogaster genome and calcu- 
lated the number of corresponding protein binding sites 
that overlapped with the random regions. The sampling 
procedure accounts for the observed biases in chromosome
localization of the 13 validated interbands (one on
chr2R; 3 on chr3L, 4 on chr3R; and 5 on chrX) and uses
corresponding weights when selecting random fragments
from a chromosome arm. Only binding regions
with positive scores were considered. No limitation on 
the size of a binding site has been imposed. Only single 
hits per random region were considered. We then calcu- 
lated the probability of getting a random DNA region of 
a given size equivalent to the source set (4/10 kb). Thus, 
we were able to estimate how many of 13 randomly
chosen fragments would overlap with the given protein
binding sites by chance. We also calculated the P-value of
the observed overlap of the 13 experimentally verified
interband regions with the given sets of protein binding
sites using the binomial test as follows:

$$P(p, m, 13) = \sum_{i=0}^{m} C_{13}^{i} \, p^{i} (1-p)^{13-i},$$

where $p$ is the frequency, expected by random chance, with which a
given set of protein binding sites overlaps a
DNA region of a given size (4/10 kb), $m$ is the number of
observed DNA regions that overlap with the given
protein binding site set, and $C_{13}^{i}$ is a binomial coefficient.

The tail of the binomial distribution to be summed up
was chosen based on the observed number of "successes"
$m$, which could be either less or more than $13p$. In the case
$m > 13p$ we set $P' = 1 - P$; otherwise the original $P$ was used.
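
The binomial P-value and the tail selection rule above can be written as a short Python sketch. It follows the formula exactly as stated, so when the upper tail is used (m > 13p) the result is 1 - P and therefore excludes the term for m itself. The function name `binomial_p_value` is hypothetical, and `math.comb` requires Python 3.8 or later.

```python
from math import comb


def binomial_p_value(p: float, m: int, n: int = 13) -> float:
    """P-value for observing m overlapping regions out of n.

    p -- chance frequency of overlap for a region of the given size
    m -- observed number of regions overlapping the binding site set
    n -- number of regions tested (13 interbands in the text)
    """
    # Cumulative sum P = sum_{i=0..m} C(n, i) p^i (1-p)^(n-i)
    cumulative = sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(m + 1))
    # Tail selection as described: use 1 - P when m exceeds the expectation n*p.
    return 1 - cumulative if m > n * p else cumulative
```

For example, `binomial_p_value(p, m)` with p taken from the random sampling and m counted over the 13 reference interbands gives the value to compare against a chosen significance threshold.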

We similarly estimated the expected numbers of 
regions that associate with the set of proteins studied in 
13000 randomly generated DNA chunks of equivalent 
size (4 and 10 kb) from three molecularly mapped bands: 
10A1-2 (ChrX: 10800000-10980000) [38], 75C1 (Chr3L:
18170000-18370000) and 75C2 (Chr3L: 18450000- 
18610000) [39]. The coordinates of band DNA sequences 
are shown according to FlyBase [51] (release 5.18). 
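
The weighted random sampling used in this section can be sketched in Python as follows. This is an illustrative sketch, not the original script. The arm weights 1, 3, 4 and 5 come from the text; the arm lengths must be supplied by the caller (any values used in an example are placeholders, not dm3 lengths), and the function names `sample_regions` and `overlap_frequency` are hypothetical.

```python
import random
from typing import Dict, List, Tuple

# Numbers of validated interbands per arm, used as sampling weights (from the text).
ARM_WEIGHTS = {"chr2R": 1, "chr3L": 3, "chr3R": 4, "chrX": 5}


def sample_regions(arm_lengths: Dict[str, int], size: int, count: int,
                   rng: random.Random) -> List[Tuple[str, int, int]]:
    """Draw `count` random regions of `size` bp, choosing arms by weight."""
    arms = list(ARM_WEIGHTS)
    weights = [ARM_WEIGHTS[a] for a in arms]
    regions = []
    for arm in rng.choices(arms, weights=weights, k=count):
        start = rng.randrange(0, arm_lengths[arm] - size + 1)
        regions.append((arm, start, start + size))
    return regions


def overlap_frequency(regions: List[Tuple[str, int, int]],
                      binding_sites: Dict[str, List[Tuple[int, int]]]) -> float:
    """Fraction of regions overlapping at least one binding site.

    Each region is counted once regardless of how many sites hit it,
    mirroring the single-hit rule in the text.
    """
    hits = 0
    for arm, start, end in regions:
        if any(s < end and start < e for s, e in binding_sites.get(arm, [])):
            hits += 1
    return hits / len(regions)
```

Calling `sample_regions(lengths, 4000, 13000, random.Random(seed))` followed by `overlap_frequency` yields an estimate of the chance frequency p for 4 kb regions; multiplying it by 13 gives the expected number of overlapping regions among 13 random fragments. Restricting the start positions to the coordinates of a mapped band would give the corresponding estimate for band sequences.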

Additional material 



Additional file 2: Tables S1-S8. Supplemental Tables 1-8. Table S1
Molecular coordinates of integration sites of P-transgenes used to map
interbands. Table S2 Accession numbers of chromatin proteins. Table 
S3 List of proteins analyzed and number of regions with positive scores. 
Table S4 Frequencies of protein localization in 13 interband regions and 
in random DNA samplings of D. melanogaster genome and band 
sequences. Table S5 Distribution of P transposon insertions within 
interband regions. Table S6 Number of pair-wise overlaps between DNA 
fragments bound by the chromatin proteins analyzed. Table S7 Pair-wise 
correlation scores for proteins analyzed. Table S8 P-value scores for pair- 
wise correlations between DNA fragments associated with the chromatin 
proteins analyzed. 

Additional file 3: Figure S2. Frequency of chromatin states in 13 
regions of D. melanogaster genome that correspond to interbands 
in polytene chromosomes. A - 5 "colored" chromatin states according 
to [35]; B - 9 chromatin types according to [36]. Sizes of DNA segments
centered at the insertion sites of reference P-transposons (X axis);
Percentage of DNA fragments associated with a particular type of
chromatin calculated for each segment (Y axis).